Abstract: In online aggregation (OLA), a database system processes a user’s aggregation query in an online fashion. During query processing, the system gives the user an estimate of the final query result, with confidence bounds that become tighter over time. The Map-Reduce programming approach is closely related to cloud computing, and online aggregation is today a very attractive technology. In this paper, I describe how online aggregation can be built into a Map-Reduce system for large-scale data processing, and I detail the implementation of OLA models in Hyracks. The literature survey section briefly discusses several online aggregation methodologies, such as OATS, COLA, and Parallel Online Aggregation, along with their advantages and limitations. Lastly, I present the advantages and limitations of OLA itself. Online aggregation is a sampling-based technique that answers aggregation queries with an estimate of the final result and a confidence interval that becomes tighter over time. It has been built into a Map-Reduce-based cloud system for big data analytics, where it allows users to monitor query progress and to save money by killing the computation early once sufficient accuracy has been obtained. However, several limitations restrict the performance of online aggregation. They arise from the gap between the current mechanism of the Map-Reduce paradigm and the requirements of online aggregation: 1) low sampling efficiency, because online aggregation in Map-Reduce does not account for skewed data distributions; and 2) large redundant I/O cost, caused by the independent job execution mechanism of Map-Reduce.
Keywords: Cloud, Hadoop, Map-Reduce, Hyracks, Online Aggregation.
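
The core idea summarized in the abstract, an estimate whose confidence interval tightens as more data is read and a computation that can be stopped once the interval is narrow enough, can be sketched on a single machine as follows. This is only a minimal illustration under stated assumptions, not the Map-Reduce or Hyracks implementation discussed in this paper: the function name online_mean, the parameter report_every, the synthetic data, and the stopping threshold of 0.2 are all hypothetical. It also assumes that rows arrive in random order and uses a normal approximation for the interval.

```python
import math
import random


def online_mean(values, z=1.96, report_every=1000):
    """Yield (rows_seen, estimate, half_width) while scanning `values`.

    Assumes rows arrive in random order, so each prefix is a random sample.
    The interval is a normal-approximation (CLT) confidence interval with
    critical value `z` (1.96 corresponds to roughly 95% confidence).
    """
    n = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations (Welford's algorithm)
    for x in values:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
        if n > 1 and n % report_every == 0:
            sample_variance = m2 / (n - 1)
            std_err = math.sqrt(sample_variance / n)
            yield n, mean, z * std_err


# Hypothetical synthetic data set standing in for a large table column.
random.seed(0)
data = [random.gauss(100.0, 15.0) for _ in range(50_000)]
random.shuffle(data)  # random arrival order is an assumption of the estimator

reports = []
for rows_seen, estimate, half_width in online_mean(data):
    reports.append((rows_seen, estimate, half_width))
    # The user "kills" the query once the interval is tight enough.
    if half_width < 0.2:
        break

for rows_seen, estimate, half_width in reports[::5]:
    print(f"{rows_seen:>6} rows: {estimate:.2f} +/- {half_width:.2f}")
```

In this sketch the loop stops well before all rows have been scanned, which mirrors the cost saving described above. The random-order assumption matters: if the input is skewed or clustered on disk, a prefix of the data is no longer a random sample and the interval is not reliable, which is one facet of the sampling-efficiency limitation noted in the abstract.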